{ "cells": [ { "cell_type": "markdown", "source": [ "Author: Krzysztof Tołpa" ], "metadata": { "collapsed": false }, "id": "4b43d2953318a5f1" }, { "cell_type": "markdown", "source": [ "# EEG Preprocessing for an Oddball Paradigm\n", "\n", "Preprocessing is a critical step in EEG analysis. It involves cleaning the data to remove noise and artifacts — signals that are not of neural origin — which can otherwise contaminate our results. We use ***MNE-Python*** library accompanied with adapted from ***EEGLAB*** functions: ``rejchanspec`` and ``trimoutlier``. We also utilize ``mne-icalabel`` plugin for automatic classification ICA components. \n", "\n", "Our pipeline covers commonly applied steps:\n", "1. ***Loading Data***: Importing the raw EEG recording.\n", "2. ***Channel Management***: Handling channel locations, names, and types.\n", "3. ***Filtering***: Applying high-pass and low-pass filters to isolate the frequency bands of interest.\n", "4. ***Artifact Detection***: Identifying and marking bad channels and noisy time segments.\n", "5. ***Artifact Removal with ICA***: Using Independent Component Analysis (ICA) to remove stereotyped artifacts like eye blinks and muscle activity.\n", "6. ***Interpolation & Resampling***: Reconstructing bad channels and downsampling the data to make it more manageable.\n", "7. ***Epoching***: Segmenting the continuous data into trials (epochs) centered around our experimental events.\n", "\n", "Let's get started!" ], "metadata": { "collapsed": false }, "id": "ddbf796a" }, { "cell_type": "markdown", "id": "81787e01", "metadata": {}, "source": [ "## 1. Setup: Importing Necessary Libraries\n", "\n", "First things first, we need to import all the Python packages we'll use throughout the analysis.\n", "\n", "Filr with ``preprocessing_utils`` can be found on [github](https://github.com/julia-jurkowska/supFunSim-py/blob/dev/tutorials/preprocessing_utils.py).\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "b558cc0b", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:07:40.510644Z", "start_time": "2025-09-14T00:07:38.972591Z" } }, "outputs": [], "source": [ "import mne\n", "import numpy as np\n", "from os.path import join\n", "import preprocessing_utils as utils\n", "from mne.preprocessing import ICA\n", "from mne_icalabel import label_components\n", "import matplotlib\n", "matplotlib.use('TkAgg') # so that the figures will pop up outside the cells\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore', category=RuntimeWarning)" ] }, { "cell_type": "markdown", "id": "a7a0c921", "metadata": {}, "source": [ "## 2. Loading and Inspecting Raw Data\n", "\n", "Now, we'll define the file paths and load our raw data. We are working with data in the **BrainVision** format, which consists of three files (`.vhdr`, `.vmrk`, `.eeg`). MNE conveniently handles this by pointing to the header file (`.vhdr`).\n", "\n", "We set `preload=True` to load the entire dataset into memory. This is necessary for many preprocessing steps, especially filtering and ICA." ] }, { "cell_type": "code", "execution_count": 2, "id": "f57ab8a2", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:07:45.212702Z", "start_time": "2025-09-14T00:07:43.216310Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracting parameters from /Volumes/UMK/oddball/subjects/sample_subject/_eeg/_raw/sample_subject_oddball.vhdr...\n", "Setting channel info structure...\n", "Reading 0 ... 1329199 = 0.000 ... 1329.199 secs...\n" ] } ], "source": [ "subjects_dir = 'data'\n", "subject = 'sample_subject'\n", "\n", "filepath = join(subjects_dir,subject,'_eeg','_raw')\n", "\n", "filename = subject+'_oddball.vhdr'\n", "\n", "raw = mne.io.read_raw_brainvision(join(filepath, filename), preload=True) \n" ] }, { "cell_type": "markdown", "id": "3faa8d9f", "metadata": {}, "source": [ "#### Visual Inspection\n", "\n", "Before diving into any processing, let's **look at the data**. This first-pass visual check can help you spot major problems like dead channels (completely flat lines), extreme electrical noise, or large movement artifacts. The `raw.plot()` function is an interactive tool that lets you scroll through your data. It can be run at any step of the preprocessing to monitor the results." ] }, { "cell_type": "code", "execution_count": 3, "id": "422591e7", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:08:13.995411Z", "start_time": "2025-09-14T00:08:13.956704Z" } }, "outputs": [], "source": [ "# raw.plot()" ] }, { "cell_type": "markdown", "id": "308963e6", "metadata": {}, "source": [ "## 3. Channel Management\n", "\n", "Properly defining our channels is a foundational step. This involves removing non-essential channels and ensuring the remaining ones are correctly typed before we assign their 3D locations.\n", "\n", "* **Dropping Unwanted Channels**: We first remove any channels that are not EEG sensors, such as **EMG** (muscle) or **EOG** (eye movement) channels.\n", "* **Setting Channel Types**: We then ensure that all channels intended as EEG sensors are correctly labeled. Mastoid channels like `TP9` and `TP10` were assigned `'misc'` category since they were attached to the earlobes. We explicitly set them to `'eeg'`, their location will be adjusted when we load individual montage." ] }, { "cell_type": "code", "execution_count": 4, "id": "90f3164c", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:08:16.525697Z", "start_time": "2025-09-14T00:08:15.424565Z" } }, "outputs": [], "source": [ "raw = raw.copy().drop_channels([\"EMG\", \"EOG\"])\n", " \n", "raw = raw.copy().set_channel_types({\"TP9\": \"eeg\", \"TP10\":\"eeg\"})" ] }, { "cell_type": "markdown", "id": "0da1b288", "metadata": {}, "source": [ "#### Setting the Montage (Electrode Locations)\n", "\n", "To know where our electrodes were on the participant's head, we need to apply a **montage**. A montage file contains the 3D coordinates of each sensor. This is essential for:\n", "* Topographical plots (visualizing scalp activity).\n", "* Interpolating bad channels based on their neighbors.\n", "* Source localization analyses.\n", "\n", "Here, we're loading a subject's individual `.bvct` file, which is a format from the Captrak digitizer system, and applying it to our `raw` object." ] }, { "cell_type": "code", "execution_count": 5, "id": "3f1335c1", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:08:19.158023Z", "start_time": "2025-09-14T00:08:19.076373Z" } }, "outputs": [ { "data": { "text/plain": "", "text/html": "\n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n \n \n\n\n \n \n \n \n \n\n\n \n \n \n \n \n\n \n\n\n\n\n\n\n\n\n \n \n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n \n \n \n\n\n\n \n\n\n\n\n\n\n\n\n \n \n\n\n\n \n\n \n \n \n\n\n\n\n \n \n \n \n \n\n \n\n\n\n\n\n\n\n\n \n \n\n\n\n\n \n \n \n\n\n\n\n \n \n \n\n\n\n
\n \n \n General\n
Filename(s)\n \n sample_subject_oddball.eeg\n \n \n
MNE object typeRawBrainVision
Measurement date2018-10-31 at 11:19:57 UTC
ParticipantUnknown
ExperimenterUnknown
\n \n \n Acquisition\n
Duration00:22:10 (HH:MM:SS)
Sampling frequency1000.00 Hz
Time points1,329,200
\n \n \n Channels\n
EEG\n \n\n \n
Head & sensor digitization130 points
\n \n \n Filters\n
Highpass0.00 Hz
Lowpass280.00 Hz
" }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get channel positions\n", "\n", "montage_path = join(subjects_dir,subject,'_eeg','montage','captrak')\n", "montage_name =f'{subject}_captrak.bvct'\n", "montage = mne.channels.read_dig_captrak(join(montage_path, montage_name))\n", "raw.set_montage(montage)" ] }, { "cell_type": "markdown", "id": "a27334a0", "metadata": {}, "source": [ "## 4. Filtering and Artifact Detection\n", "\n", "Now we begin the core cleaning process. The order of operations here is important.\n", "\n", "#### High-Pass Filtering\n", "\n", "We start with a **high-pass filter** to remove slow drifts in the data. These drifts can be caused by sweat, skin potentials, or slow electrode movement. A cutoff of 1 Hz is common for ERP analysis, as it effectively removes these drifts while preserving the faster neural signals of interest." ] }, { "cell_type": "code", "execution_count": 6, "id": "278de538", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:08:25.702240Z", "start_time": "2025-09-14T00:08:20.749988Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Filtering raw data in 1 contiguous segment\n", "Setting up high-pass filter at 1 Hz\n", "\n", "FIR filter parameters\n", "---------------------\n", "Designing a one-pass, zero-phase, non-causal highpass filter:\n", "- Windowed time-domain design (firwin) method\n", "- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation\n", "- Lower passband edge: 1.00\n", "- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)\n", "- Filter length: 3301 samples (3.301 s)\n" ] } ], "source": [ "# High-Pass Filtering (>1 Hz)\n", "raw = raw.filter(l_freq = 1, h_freq=None, method='fir', fir_window='hamming', n_jobs=-1) " ] }, { "cell_type": "markdown", "id": "c3616eb1", "metadata": {}, "source": [ "#### Automated Bad Channel and Segment Detection\n", "\n", "\n", "Next, we identify and mark bad data. This is a two-step process using custom utility functions, which are often developed in a lab to standardize analysis.\n", "\n", "1. **`rejchanspec` (Reject Channel by Spectrum)**: This function identifies channels with abnormal spectral power. The `freqlims` parameter defines the frequency bands to examine, while `stdlims` sets the corresponding standard deviation (Z-score) thresholds. For example, a channel is marked as bad if its power within a given band exceeds the specified threshold. The names of these bad channels are then added to the `raw.info['bads']` list.\n", "\n", "2. **`trimoutlier` (Trim Outliers)**: This function detects and marks time segments containing extreme voltage fluctuations, such as those caused by subject movement. The `amplitude_threshold` parameter sets the absolute voltage value (e.g., 500 µV) above which a data point is considered an artifact. The `point_spread_width` parameter then marks a time window (in milliseconds) around any detected artifact as \"bad.\" This padding ensures that the artifact's onset and offset are also fully excluded from subsequent analyses. The function creates **annotations** that label these identified periods as \"bad.\" MNE functions can then be instructed to ignore these segments during later steps like ICA." ] }, { "cell_type": "code", "execution_count": 7, "id": "b8b5ac10", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:08:31.289281Z", "start_time": "2025-09-14T00:08:25.717770Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Removed EEG channels: ['FCC1h', 'Fz', 'FC1', 'FCC2h', 'Cz', 'FFC1h', 'FFC2h']\n" ] } ], "source": [ "# Remove Bad Channels Using REJCHANSPEC\n", "freqlims = [[0.00, 5.00], \n", " [5.00, 40.00]]\n", "stdlims = [[-5.00, 5.00], \n", " [-2.50, 2.50]]\n", "raw, specdata = utils.rejchanspec(raw.copy(), freqlims, stdlims)" ] }, { "cell_type": "code", "execution_count": 8, "id": "50f4e29e", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:08:36.965139Z", "start_time": "2025-09-14T00:08:31.292878Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected 120 EEG channels (excluding bads)\n", "No channel removed.\n", "\n", "0.0005uV threshold with 2000ms spreading rejected 20.4 sec data, added 9 boundaries.\n", "trimOutlier done. The masks for clean channels and data points are stored in the raw object.\n" ] } ], "source": [ "# Remove Bad Samples Using TRIMOUTLIER\n", "\n", "amplitude_threshold = 500*1e-6\n", "point_spread_width = 2000 # ms\n", "\n", "raw = utils.trimoutlier(raw, amplitude_threshold, point_spread_width)" ] }, { "cell_type": "markdown", "id": "ec4b4342", "metadata": {}, "source": [ "#### Low-Pass Filtering (for ICA)\n", "\n", "Before running ICA, it's good practice to **low-pass filter** the data. High-frequency noise (like muscle activity) can sometimes dominate the signal and prevent ICA from finding a good solution. We apply a 100 Hz low-pass filter here to clean up the signal specifically for the ICA decomposition. We will apply a stricter low-pass filter later for our final analysis." ] }, { "cell_type": "code", "execution_count": 9, "id": "486a882a", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:08:42.408678Z", "start_time": "2025-09-14T00:08:37.019604Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Filtering raw data in 1 contiguous segment\n", "Setting up low-pass filter at 1e+02 Hz\n", "\n", "FIR filter parameters\n", "---------------------\n", "Designing a one-pass, zero-phase, non-causal lowpass filter:\n", "- Windowed time-domain design (firwin) method\n", "- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation\n", "- Upper passband edge: 100.00 Hz\n", "- Upper transition bandwidth: 25.00 Hz (-6 dB cutoff frequency: 112.50 Hz)\n", "- Filter length: 133 samples (0.133 s)\n" ] } ], "source": [ "# Low-Pass Filtering (<100 Hz)\n", "raw = raw.filter(l_freq = None, h_freq=100, method='fir', fir_window='hamming', n_jobs=-1) " ] }, { "cell_type": "markdown", "id": "e8f99079", "metadata": {}, "source": [ "## 5. Rereferencing\n", "\n", "The voltage at any one electrode is measured relative to a reference. Here, the data was recorded with `FCz` as a reference electrode. We can bring it back to the analysis and rereference to the **average of all EEG channels**. " ] }, { "cell_type": "code", "execution_count": 10, "id": "d3895a1d", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:08:43.676258Z", "start_time": "2025-09-14T00:08:42.412525Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "EEG channel type selected for re-referencing\n", "Applying average reference.\n", "Applying a custom ('EEG',) reference.\n" ] } ], "source": [ "# Bring back FCz channel\n", "raw = mne.add_reference_channels(raw, 'FCz')\n", "raw.set_montage(montage)\n", " \n", "# Re-reference to average reference\n", "raw = raw.set_eeg_reference(ref_channels=\"average\")" ] }, { "cell_type": "markdown", "id": "367f3444", "metadata": {}, "source": [ "## 6. Independent Component Analysis (ICA) for Artifact Removal\n", "\n", "**ICA** is a technique for separating statistically independent signals that have been mixed together. In EEG, it's effective at isolating stereotyped artifacts like:\n", "* Eye blinks and eye movements\n", "* Heartbeat signals (ECG)\n", "* Persistent muscle tension\n", "\n", "The process is:\n", "1. **Fit ICA**: We decompose the EEG signal into a set of independent components (ICs). We do this only on the \"good\" data segments, using `reject_by_annotation=True`.\n", "2. **Label Components**: We use the `mne-icalabel` library to automatically classify each IC into categories like brain, eye blink, muscle, etc. For the best performance, **`ICLabel` requires data that is average-referenced, filtered between 1-100 Hz, and decomposed using an `extended infomax` ICA**. We have explicitly followed these steps in our pipeline, which justifies our earlier preprocessing choices and the ICA settings we are about to use. \n", "3. **Remove Artifactual Components**: We identify the ICs that were confidently labeled as artifacts (e.g., \"eye blink\") and instruct MNE to remove them. " ] }, { "cell_type": "code", "execution_count": 11, "id": "4da3eb51", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:14.461194Z", "start_time": "2025-09-14T00:08:43.678244Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Omitting 20454 of 1329200 (1.54%) samples, retaining 1308746 (98.46%) samples.\n", "Fitting ICA to data using 121 channels (please be patient, this may take a while)\n", "Omitting 20454 of 1329200 (1.54%) samples, retaining 1308746 (98.46%) samples.\n", "Selecting by number: 120 components\n", "Computing Extended Infomax ICA\n", "Fitting ICA took 1790.4s.\n" ] } ], "source": [ "# ICA\n", "chans_to_pick = [chan for chan in raw.info['ch_names'] if chan not in raw.info['bads']]\n", "rank = np.linalg.matrix_rank(raw.get_data(reject_by_annotation='omit', picks=chans_to_pick).T)\n", "\n", "ica = ICA(\n", " n_components=rank,\n", " max_iter=\"auto\",\n", " method=\"infomax\",\n", " random_state=692,\n", " fit_params=dict(extended=True),\n", ")\n", "\n", "ica.fit(raw, reject=None, reject_by_annotation=True)\n", "\n", "# Classify components using mne-iclabel\n", "ic_labels = label_components(raw, ica, method=\"iclabel\") " ] }, { "cell_type": "markdown", "id": "bcf89b90", "metadata": {}, "source": [ "We can visually inspect identified components. \n", "The code below first selects the components that were labeled as \"eye blink\" with high confidence (over 75%). Then, it generates two types of plots for our inspection:\n", "* `plot_properties()`: A detailed summary, including the component's ***scalp topography*** (which should be frontal for blinks) and frequency spectrum. \n", "* `plot_sources()`: The component's ***activity over time***, which should show the characteristic sharp peaks of eye blinks.\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "67ee2664", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:14.521430Z", "start_time": "2025-09-14T00:39:14.520520Z" } }, "outputs": [], "source": [ "# ## OPTIONAL: inspect conponents \n", "# idxs = [idx for idx, (label, proba) in enumerate(zip(ic_labels[\"labels\"],ic_labels['y_pred_proba'])) if label in [\"eye blink\"] and proba > 0.75]\n", "# ica.plot_properties(raw, picks=idxs, verbose=False)\n", "# ica.plot_sources(raw, picks=idxs)\n" ] }, { "cell_type": "markdown", "id": "0dc2856f", "metadata": {}, "source": [ "The following code creates a list of component indices to `exclude_idx`. It identifies any component whose label is *not* `\"brain\"` or `\"other\"` and has a prediction probability greater than ****75%****. Finally, `ica.apply()` removes the activity from these selected components, cleaning the underlying EEG data." ] }, { "cell_type": "code", "execution_count": 13, "id": "bb1e2d3f", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:15.661564Z", "start_time": "2025-09-14T00:39:14.522745Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Applying ICA to Raw instance\n", " Transforming to ICA space (120 components)\n", " Zeroing out 34 ICA components\n", " Projecting back using 121 PCA components\n" ] }, { "data": { "text/plain": "", "text/html": "\n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n \n \n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n \n \n \n \n \n\n\n \n \n \n \n \n\n\n \n \n \n \n \n\n \n\n\n\n\n\n\n\n\n \n \n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n \n \n \n\n\n\n \n\n\n\n\n\n\n\n\n \n \n\n\n\n \n\n \n \n \n\n\n\n\n \n \n \n \n \n\n \n\n\n\n\n\n\n\n\n \n \n\n\n\n\n \n \n \n\n\n\n\n \n \n \n\n\n\n
\n \n \n General\n
Filename(s)\n \n sample_subject_oddball.eeg\n \n \n
MNE object typeRawBrainVision
Measurement date2018-10-31 at 11:19:57 UTC
ParticipantUnknown
ExperimenterUnknown
\n \n \n Acquisition\n
Duration00:22:10 (HH:MM:SS)
Sampling frequency1000.00 Hz
Time points1,329,200
\n \n \n Channels\n
EEG\n \n\n \n \n and \n \n
Head & sensor digitization131 points
\n \n \n Filters\n
Highpass1.00 Hz
Lowpass100.00 Hz
" }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "labels = ic_labels[\"labels\"] # predicted labels\n", "probas = ic_labels['y_pred_proba'] # probabilities for each component to be the labelled class\n", "\n", "# exclude components if the labelled component is not brain or other and the corresponding probability of artifact component is at least 0.75\n", "exclude_idx = [\n", " idx for idx, (label, proba) in enumerate(zip(labels, probas))\n", " if label not in [\"brain\", \"other\"] and proba > 0.75\n", "]\n", "\n", "ica.apply(raw, exclude=exclude_idx)" ] }, { "cell_type": "markdown", "id": "d3ad6ea7", "metadata": {}, "source": [ "## 7. Final Processing Steps\n", "\n", "With the major artifacts removed, we can perform the final steps to get the data ready for epoching.\n", "\n", "#### Final Low-Pass Filter\n", "\n", "We now apply a stricter **low-pass filter**, typically around 40 Hz is enough for ERP analysis." ] }, { "cell_type": "code", "execution_count": 14, "id": "f75a2700", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:18.407509Z", "start_time": "2025-09-14T00:39:15.660930Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Filtering raw data in 1 contiguous segment\n", "Setting up low-pass filter at 40 Hz\n", "\n", "FIR filter parameters\n", "---------------------\n", "Designing a one-pass, zero-phase, non-causal lowpass filter:\n", "- Windowed time-domain design (firwin) method\n", "- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation\n", "- Upper passband edge: 40.00 Hz\n", "- Upper transition bandwidth: 10.00 Hz (-6 dB cutoff frequency: 45.00 Hz)\n", "- Filter length: 331 samples (0.331 s)\n" ] } ], "source": [ "# Low-Pass Filtering (<40 Hz)\n", "raw = raw.filter(l_freq = None, h_freq=40, method='fir', fir_window='hamming', n_jobs=-1) " ] }, { "cell_type": "markdown", "id": "54e6a091", "metadata": {}, "source": [ "#### Interpolate Bad Channels and Resample\n", "\n", "Two final steps:\n", "1. **Interpolation**: Now that the data is clean, we can fill in the data for the channels we marked as \"bad\" at the beginning. The `interpolate_bads()` function uses data from neighboring channels to estimate the signal at the bad locations. We then `reset_bads=True` as the channels have now been repaired.\n", "2. **Resampling**: Our original sampling rate was 1000 Hz. This is often higher than needed for ERP analysis. We **downsample** the data to 256 Hz. This significantly reduces the file size and computational load for subsequent steps, without losing important information (thanks to the Nyquist theorem and our 40 Hz low-pass filter)." ] }, { "cell_type": "code", "execution_count": 15, "id": "5e9dc3bb", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:25.777051Z", "start_time": "2025-09-14T00:39:18.407875Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting channel interpolation method to {'eeg': 'spline'}.\n", "Interpolating bad channels.\n", " Automatic origin fit: head of radius 99.6 mm\n", "Computing interpolation matrix from 121 sensor positions\n", "Interpolating 7 sensors\n" ] } ], "source": [ "# Interpolate Bad Channels\n", "raw = raw.interpolate_bads(reset_bads=True, mode='spline')\n", "\n", "# Downsample to 256 Hz\n", "raw = raw.resample(sfreq=256)\n" ] }, { "cell_type": "markdown", "id": "260b4260", "metadata": {}, "source": [ "#### Final Check for Artifacts and Saving\n", "\n", "As a final quality control check, we can run `trimoutlier` with a stricter threshold to catch any remaining artifacts. Finally, it's excellent practice to **save the preprocessed continuous data**. We save it in MNE's native `.fif` format." ] }, { "cell_type": "code", "execution_count": 16, "id": "10c4036e", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:26.288878Z", "start_time": "2025-09-14T00:39:25.778989Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected 128 EEG channels (excluding bads)\n", "No channel removed.\n", "\n", "9.999999999999999e-05uV threshold with 1000ms spreading rejected 4.3 sec data, added 4 boundaries.\n", "trimOutlier done. The masks for clean channels and data points are stored in the raw object.\n" ] } ], "source": [ "# Final checks for Bad Samples Using TRIMOUTLIER\n", "channel_sd_lower_bound = -np.inf\n", "channel_sd_upper_bound = np.inf\n", "amplitude_threshold = 100*1e-6\n", "point_spread_width = 1000 # 2000\n", "\n", "raw = utils.trimoutlier(raw, amplitude_threshold, point_spread_width)" ] }, { "cell_type": "markdown", "id": "cf39c9f8", "metadata": {}, "source": [ "Let's inspect the results of the preprocessing by plotting raw object data." ] }, { "cell_type": "code", "execution_count": 17, "id": "07fb10f5", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:26.292234Z", "start_time": "2025-09-14T00:39:26.289459Z" } }, "outputs": [], "source": [ "# raw.plot()" ] }, { "cell_type": "markdown", "id": "48a8656e", "metadata": {}, "source": [ "Upon closer inspection, we can still see some smaller, transient artifacts that were not caught by the previous threshold (for example: muscle artifacts around 545s or 648s or increased slow wave activity related to tiredness around 457s). To ensure our data is as clean as possible, we will run the `trimoutlier` function one more time with a stricter rejection threshold." ] }, { "cell_type": "code", "execution_count": 18, "id": "beb9d079", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:26.804308Z", "start_time": "2025-09-14T00:39:26.293412Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected 128 EEG channels (excluding bads)\n", "No channel removed.\n", "\n", "4.9999999999999996e-05uV threshold with 1000ms spreading rejected 45.1 sec data, added 36 boundaries.\n", "trimOutlier done. The masks for clean channels and data points are stored in the raw object.\n" ] } ], "source": [ "amplitude_threshold = 50*1e-6\n", "point_spread_width = 1000 # 2000\n", "\n", "raw = utils.trimoutlier(raw, amplitude_threshold, point_spread_width)" ] }, { "cell_type": "code", "execution_count": 19, "id": "6e72e075", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:26.807794Z", "start_time": "2025-09-14T00:39:26.804967Z" } }, "outputs": [], "source": [ "# Save the raw object data\n", "filepath_save = join(subjects_dir, subject,'_eeg','_pre')\n", "#raw.save(join(filepath_save, filename.replace('.vhdr', '_preprocessed_raw.fif')), overwrite=True)" ] }, { "cell_type": "markdown", "id": "f490d13b", "metadata": {}, "source": [ "## 8. Epoching and Baseline Correction\n", "\n", "The final step is to segment the continuous data into **epochs** (or trials) time-locked to the events of interest in our oddball paradigm.\n", "\n", "* **Extract Events**: We first parse the event markers (annotations) from the raw data, mapping the trigger codes (e.g., 'S 5') to meaningful condition names (`standard`, `deviant`, `target`).\n", "* **Create Epochs**: We then slice the data from -200 ms before each event to 800 ms after.\n", "* **Baseline Correction**: For each epoch, we calculate the average signal in the pre-stimulus period (-200 ms to 0 ms) and subtract this average from the entire epoch. This ensures that any differences we see after the stimulus are not due to random fluctuations from before the stimulus appeared.\n", "* **Drop Bad Epochs**: Any epochs that overlap with time periods we previously marked as \"bad\" are automatically discarded using `reject_by_annotation=True`.\n", "\n", "The result is an `Epochs` object, which is the final, analysis-ready data structure. We save this object to a file for the next stage of analysis." ] }, { "cell_type": "code", "execution_count": 20, "id": "aff8b8d2", "metadata": { "ExecuteTime": { "end_time": "2025-09-14T00:39:27.783539Z", "start_time": "2025-09-14T00:39:26.809191Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Used Annotations descriptions: [np.str_('Stimulus/S 5'), np.str_('Stimulus/S 6'), np.str_('Stimulus/S 7')]\n", "Not setting metadata\n", "660 matching events found\n", "Setting baseline interval to [-0.19921875, 0.0] s\n", "Applying baseline correction (mode: mean)\n", "Using data from preloaded Raw for 660 events and 257 original time points ...\n", "39 bad epochs dropped\n" ] } ], "source": [ "# Divide data into epochs with DC detrend and -0.2 - 0s baseline correction\n", "all_events, all_event_id = mne.events_from_annotations(raw, event_id={'Stimulus/S 5': 3, 'Stimulus/S 6': 4,\n", " 'Stimulus/S 7': 5})\n", "\n", "epoched = mne.Epochs(raw, all_events, event_id=dict(standard=3, deviant=4, target=5), tmin=-0.2, tmax=0.8,\n", " baseline=(None, 0), reject_by_annotation=True, detrend=0, proj=False,\n", " reject=None, preload=True)\n", "epoched.drop_bad()" ] }, { "cell_type": "code", "execution_count": null, "id": "0f72e805", "metadata": {}, "outputs": [], "source": [ "epoched.save(join(filepath_save, filename.replace('.vhdr', '_epo.fif')), overwrite=True)" ] } ], "metadata": { "kernelspec": { "name": "mvpure-tools-venv", "language": "python", "display_name": "mvpure-tools-venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.18" } }, "nbformat": 4, "nbformat_minor": 5 }